Speech recognition apparatus

ABSTRACT

A noise signal supplied from a microphone by way of an analog to digital converter is Fourier transformed by a Fourier transforming section to calculate a power spectrum of the noise signal. A system controller compares an average value of the power spectrum of the signal currently being output from the microphone with an average value of the power spectrum of the noise signal currently stored in a noise memory. When the system controller determines that the difference between these two average values is greater than a predetermined reference value, it outputs a control signal to a sound storing and reading out section to store the signal currently being output from the microphone into the noise memory. Consequently, the signal currently being output from the microphone is stored into the noise memory in place of the noise signal previously stored in the noise memory.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a speech recognition apparatus which is suitably applied to recognize speech.

2. Description of the Related Art

Conventionally, speech recognition devices are constructed such that a speech pattern produced using a characteristic parameter extracted from speech input to the device is successively compared with standard patterns which have been produced in advance using a characteristic parameter such as, for example, a linear forecasting coefficient extracted from speech of an arbitrary or particular speaker. Then, speech corresponding to the standard pattern which is most similar to, that is, smallest in deviation from, the input speech pattern is output from the device as a result of recognition of the speech input to the device.
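The comparison described above can be illustrated by a minimal sketch in Python. It assumes that each pattern is a fixed-length vector of characteristic parameters and that "deviation" is measured as a Euclidean distance; the text does not specify the measure, and the names `deviation`, `recognize` and the sample vectors are hypothetical.

```python
import math


def deviation(pattern_a, pattern_b):
    """Euclidean distance between two equal-length parameter vectors (assumed measure)."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pattern_a, pattern_b)))


def recognize(speech_pattern, standard_patterns):
    """Return the label of the standard pattern with the smallest deviation."""
    if not standard_patterns:
        raise ValueError("no standard patterns registered")
    return min(
        standard_patterns,
        key=lambda label: deviation(speech_pattern, standard_patterns[label]),
    )


# Hypothetical standard patterns, keyed by the word they represent.
standard_patterns = {"yes": [1.0, 0.2, -0.3], "no": [-0.8, 0.9, 0.1]}
result = recognize([0.9, 0.1, -0.2], standard_patterns)  # "yes"
```

A practical device would more likely compare sequences of parameter vectors, so a single fixed-length vector is a simplification made only for illustration.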

The speech from which the standard patterns described above are produced is normally recorded (stored) in a specific environment such as, for example, a sound-proof chamber in which any sound other than the speech, that is, noise, can be eliminated. The standard patterns which are produced from the speech recorded in this manner therefore have a good S/N ratio. However, speech recognition devices are more frequently used in situations wherein noise (environmental noise) such as, for example, the sound of an engine of an automobile or voices in conversation of foot-traffic is present than in situations wherein noise is absent, such as the one described above. Accordingly, conventional speech recognition devices have a problem to be solved in that environmental noise may be input to the device, in addition to the speech to be recognized, and will adversely affect the recognition rate of the device.

It is already known that, in order to improve the recognition rate of a speech recognition device adversely affected by environmental noise, it may be effective to produce standard patterns in advance which include environmental noise which is likely to occur at the location at which the apparatus is to be used. However, it is unlikely that the forecast environmental noise will occur steadily. Rather, environmental noise other than the forecast environmental noise may occur, or the forecast environmental noise may not occur. In either instance, the recognition rate of the apparatus is adversely affected.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a speech recognition apparatus which always has a high speech recognition rate.

In order to attain the object, according to the present invention, there is provided a speech recognition apparatus, which comprises an inputting means capable of inputting object speech and noise, a speech storage means for storing therein the object speech input from the inputting means, noise storage means for storing therein the noise input from the inputting means, pattern producing means for producing a speech pattern from the object speech input from the inputting means and for adding the object speech stored in the speech storage means and the noise stored in the noise storage means to produce a standard pattern, standard pattern storage means for storing therein the standard pattern produced by the pattern producing means, recognizing means for comparing the speech pattern produced by the pattern producing means with the standard pattern stored in the standard pattern storage means to recognize the object speech input from the inputting means, and updating means for causing the noise storage means to successively update the stored noise thereof in response to variations in the noise input from the inputting means.

In the speech recognition apparatus, object speech and noise input from the inputting means, such as a microphone, are stored in the speech storage means and the noise storage means, respectively. The pattern producing means adds the object speech and the noise thus stored to each other to produce a standard pattern. Further, the pattern producing means produces a speech pattern from the object speech input from the inputting means. The recognizing means compares the standard pattern and the speech pattern with each other to recognize the object speech. The updating means causes the noise storage means to successively update the noise stored therein in response to variations in the noise input from the inputting means. Accordingly, with the present speech recognition apparatus, a standard pattern which includes noise corresponding to the environment in which the apparatus is used is produced. Consequently, the speech recognition apparatus can achieve a high recognition rate.

Preferably, the updating means includes detecting means for periodically detecting any variation in the noise input from the inputting means and for causing the noise storage means to successively update the stored noise thereof in response to variations in the noise detected by the detecting means. Thus, standard patterns which include a noise signal corresponding to the environment in which the apparatus is used are successively produced. Consequently, the speech recognition apparatus can achieve a high recognition rate irrespective of variations in the environmental noise. Preferably, the detecting means causes the noise storage means to update the noise stored therein when the detecting means detects a variation in the noise input from the inputting means which is greater than a first predetermined reference value. When the variation in the noise input from the inputting means is greater than the first predetermined reference value, the detecting means further detects whether the variation is greater than a second predetermined reference value which is higher than the first predetermined reference value. The updating means does not cause the noise storage means to update the stored noise when the detecting means detects that the variation of the noise input from the inputting means is greater than the second predetermined reference value. Thus, the noise stored in the noise storage means is updated each time the variation of the noise input from the inputting means exceeds the first predetermined reference value but does not exceed the second predetermined reference value. Consequently, even when a signal having an excessively high signal level is input by way of the inputting means, such as, for example, when a temporary noise is input by way of the inputting means or when speech uttered from a speaker to be recognized is input by way of the inputting means, the noise stored in the noise storage means is prevented from being updated in error. Consequently, the noise stored in the noise storage means is always updated regularly.

The speech recognition apparatus may further comprise reproducing means for reproducing the object speech stored in the speech storage means in response to a recognition result by the recognizing means. Thus, the speech recognition apparatus allows a recognition result to be readily confirmed by the user of the apparatus by means of the reproducing means. Further, the present speech recognition apparatus eliminates the necessity of providing a memory for storing speech from which a recognition result is confirmed, and consequently, the speech recognition apparatus can be produced at a reduced cost.

The above and other objects, features and advantages of the present invention will become apparent from the following description and the appended claims, taken in conjunction with the accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram of a speech recognition apparatus showing a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring to FIG. 1, there is shown a speech recognition apparatus according to the present invention. The speech recognition apparatus shown includes a microphone 1 which converts sound input thereto, such as speech or noise in the environment in which the apparatus is used, into an electric signal in the form of an analog signal. An analog to digital converter 2 converts the analog signal output from the microphone 1 into a digital signal. A speech memory 4 stores therein sound (a speech signal) which is received from the microphone 1 by way of the analog to digital converter 2 and a sound storing and reading out section 3 and from which a recognition pattern is to be produced. A noise memory 5 stores therein noise (a noise signal) received from the microphone 1 by way of the analog to digital converter 2 and the sound storing and reading out section 3.

Here, noise is defined as speech and/or sound other than object speech to be recognized by the apparatus.

The sound storing and reading out section 3 is controlled by a system controller 6 so as to store a digital signal output from the analog to digital converter 2 into the speech memory 4 or the noise memory 5. Further, the sound storing and reading out section 3 reads out the speech signal or noise signal stored in the speech memory 4 or noise memory 5 and outputs the thus read out speech signal or noise signal to a sound adding section 7. Furthermore, the sound storing and reading out section 3 reads out the speech signal stored in the speech memory 4 in response to a result of recognition supplied thereto from a speech recognizing section 9 by way of the system controller 6 and outputs the thus read out speech signal to a digital to analog converter 12.

The digital to analog converter 12 converts the speech signal in the form of a digital signal read out from the speech memory 4 by the sound storing and reading out section 3 into an analog signal. A loudspeaker 13 converts the analog signal (speech signal) output from the digital to analog converter 12 into sound and outputs the sound.

The sound adding section 7 adds the speech signal and the noise signal read out respectively from the speech memory 4 and the noise memory 5 by the sound storing and reading out section 3. A pattern producing section 8 extracts a characteristic parameter such as, for example, a linear forecasting coefficient, both from the signal output from the sound adding section 7 as a result of the combination of the speech signal and the noise signal, and from the sound signal which is received from the microphone 1 by way of the analog to digital converter 2 and which represents the object sound which includes therein noise and the speech which is to be recognized. Further, the pattern producing section 8 produces a standard pattern from the former and a speech pattern from the latter. A pattern storage section 10 stores therein the standard pattern output from the pattern producing section 8.

The speech recognizing section 9 compares the speech pattern output from the pattern producing section 8 with a standard pattern stored in the pattern storage section 10 to effect speech recognition.

A Fourier transforming section 11 Fourier transforms the digital signal output from the analog to digital converter 2, calculates a power spectrum of the signal after Fourier transformation, and supplies the power spectrum to the system controller 6. The system controller 6 controls the components of the apparatus. For example, the system controller 6 controls the sound storing and reading out section 3 in response to the power spectrum supplied thereto from the Fourier transforming section 11 and a result of the recognition by the speech recognizing section 9.
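By way of illustration only, the calculation performed by the Fourier transforming section 11 might be sketched as follows. The sketch assumes that the digital signal is processed in short frames of real-valued samples and uses a naive discrete Fourier transform for clarity; the frame length, the function name `power_spectrum` and the use of the squared magnitude as the power are assumptions not stated in this description.

```python
import cmath


def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0 .. N//2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must contain at least one sample")
    spectrum = []
    for k in range(n // 2 + 1):
        # X[k] = sum over i of x[i] * exp(-2j*pi*k*i/N)
        x_k = sum(
            sample * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, sample in enumerate(frame)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# One cycle of a cosine over four samples: all power falls in bin 1.
bins = power_spectrum([1.0, 0.0, -1.0, 0.0])  # approximately [0.0, 4.0, 0.0]
```

An actual apparatus would presumably use a fast Fourier transform and a window function; the naive form is chosen here only to keep the sketch self-contained.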

We will now describe the operation of the speech recognition apparatus when a speech signal from which a standard pattern is to be produced is to be registered or stored. First, in order to notify the system controller 6 that a standard pattern is to be registered, a registration/recognition switch (not shown) is switched to the registration side. Then, in a specific location such as, for example, a sound-proof chamber in which sound other than the speech from which a standard pattern is to be produced, that is, noise, can be eliminated, predetermined speech from which a standard pattern is to be produced is input to the microphone 1. The speech input to the microphone 1 is converted into an electric signal (speech signal) and output to the analog to digital converter 2. The speech signal is converted from an analog signal into a digital signal by the analog to digital converter 2 and then supplied to the sound storing and reading out section 3. The sound storing and reading out section 3 stores the speech signal supplied thereto from the analog to digital converter 2 in the speech memory 4 in accordance with a control signal output from the system controller 6.

A speech signal of a high S/N ratio from which a standard pattern is to be produced is registered in this manner.

Now, operation of the speech recognition apparatus when recognizing speech using the thus registered speech will be described. First, the registration/recognition switch is switched to the recognition side to notify the system controller 6 that speech recognition is to be effected. Then, noise (environmental noise) input to the microphone 1 in the environment in which the apparatus is used is supplied to the sound storing and reading out section 3 and the Fourier transforming section 11 by way of the analog to digital converter 2. The sound storing and reading out section 3 stores the noise signal supplied thereto from the microphone 1 by way of the analog to digital converter 2 into the noise memory 5 in accordance with a control signal output from the system controller 6.

Meanwhile, the Fourier transforming section 11 Fourier transforms the noise signal supplied thereto from the microphone 1 by way of the analog to digital converter 2 and calculates a power spectrum. The system controller 6 calculates, for each predetermined reference period, an average value of the power spectrum supplied successively thereto from the Fourier transforming section 11. The system controller 6 compares the average value of the power spectrum of the signal currently being output from the microphone 1 with the average value of the power spectrum which was calculated when the system controller 6 last supplied a control signal to the sound storing and reading out section 3 to store a signal output from the microphone 1 into the noise memory 5; that is, with the average value of the power spectrum of the noise signal already stored in the noise memory 5.
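The averaging and comparison just described can be sketched as follows. The description does not say whether the average is taken over frequency, over time, or over both, nor whether the difference is signed; the sketch assumes a single mean over every power-spectrum bin of every frame in one reference period and an absolute difference. The name `average_power` and the numeric values are hypothetical.

```python
def average_power(spectra):
    """Mean of all power-spectrum bins over all frames of one reference period (assumed)."""
    values = [power for spectrum in spectra for power in spectrum]
    if not values:
        raise ValueError("no spectra in this reference period")
    return sum(values) / len(values)


# Average computed when the noise memory was last written (hypothetical values).
stored_noise_avg = average_power([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])  # 2.0
# Average for the reference period that has just ended.
current_avg = average_power([[2.0, 4.0, 6.0], [4.0, 4.0, 4.0]])  # 4.0
# Absolute difference, assumed here so that a fall in noise level is also detected.
difference = abs(current_avg - stored_noise_avg)  # 2.0
```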

When the system controller 6 determines that the difference between the average value of the power spectrum of the signal being output from the microphone 1 and the average value of the power spectrum of the noise signal already stored in the noise memory 5 is greater than a predetermined reference value A, it outputs a control signal to the sound storing and reading out section 3 to store the signal currently being output from the microphone 1 into the noise memory 5. In response to the control signal thus output from the system controller 6, the sound storing and reading out section 3 stores the signal currently being output from the microphone 1 into the noise memory 5 in place of the signal (noise signal) previously stored in the noise memory 5.

Based on the above alone, if a signal having an excessively high signal level is input to the microphone 1, such as, for example, when temporary noise is input to the microphone 1 or when speech uttered from a speaker to be recognized is input to the microphone 1, then the system controller 6 will determine that the difference between the average value of the power spectrum of the signal currently being output from the microphone 1 and the average value of the power spectrum of the noise signal stored in the noise memory 5 is greater than the predetermined reference value A. Consequently, the signal being output from the microphone 1, that is, the temporary noise or the speech uttered from the speaker to be recognized, would be stored into the noise memory 5 in place of the signal (noise signal) previously stored in the noise memory 5. To avoid this, when the system controller 6 determines that the difference between the average value of the power spectrum of the signal currently being output from the microphone 1 and the average value of the power spectrum of the noise signal stored in the noise memory 5 is greater than another reference value B, which is greater than the first reference value A (A<B), it determines that temporary noise or speech uttered from a speaker to be recognized has been or is being input to the microphone 1. In this instance, the system controller 6 does not output a control signal to the sound storing and reading out section 3.

In this manner, the noise (noise signal) stored in the noise memory 5 is successively updated in accordance with the variations in noise in the environment in which the apparatus is used.
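The update rule using the two reference values A and B (A<B) can be summarized by the following sketch. It assumes that the difference is an absolute difference of scalar average power values; the class `NoiseMemory`, the function `maybe_update_noise` and the numeric values are hypothetical and merely stand in for the noise memory 5 and the decision made by the system controller 6.

```python
from dataclasses import dataclass, field


@dataclass
class NoiseMemory:
    """Stand-in for the noise memory 5 plus the average power recorded at the last update."""
    samples: list = field(default_factory=list)
    avg_power: float = 0.0


def maybe_update_noise(memory, current_samples, current_avg, ref_a, ref_b):
    """Replace the stored noise only when A < difference <= B; return True if updated."""
    if not ref_a < ref_b:
        raise ValueError("reference value A must be smaller than reference value B")
    difference = abs(current_avg - memory.avg_power)
    if difference > ref_b:
        # Treated as temporary noise or speech to be recognized: no control signal.
        return False
    if difference > ref_a:
        memory.samples = list(current_samples)
        memory.avg_power = current_avg
        return True
    # Noise has not varied enough to warrant an update.
    return False


memory = NoiseMemory(samples=[0.1, -0.1], avg_power=1.0)
maybe_update_noise(memory, [0.2, -0.2], 1.5, ref_a=1.0, ref_b=5.0)   # False: below A
maybe_update_noise(memory, [0.3, -0.3], 3.0, ref_a=1.0, ref_b=5.0)   # True: between A and B
maybe_update_noise(memory, [9.0, -9.0], 20.0, ref_a=1.0, ref_b=5.0)  # False: above B
```

Because the stored average is replaced only on a successful update, a loud transient leaves both the stored noise and the comparison baseline unchanged.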

Each time the stored contents of the noise memory 5 are updated, the noise signal stored in the noise memory 5 and the speech signal which is stored in the speech memory 4 and has a high S/N ratio so that a standard pattern can be produced therefrom are read out by the sound storing and reading out section 3 and supplied to the sound adding section 7. The noise signal and the speech signal (of a high S/N ratio) are added to each other at the sound adding section 7 and then supplied to the pattern producing section 8. At the pattern producing section 8, a characteristic parameter is extracted from the addition signal of the noise signal and the speech signal (having a high S/N ratio), and a standard pattern is produced from the characteristic parameter. The standard pattern thus produced is stored into the pattern storage section 10.

On the other hand, when object speech to be recognized is input to the microphone 1 together with the noise in the environment in which the apparatus is placed, it is converted from an analog signal into a digital signal by the analog to digital converter 2 and then output to the pattern producing section 8. At the pattern producing section 8, a characteristic parameter is extracted from the speech signal output from the analog to digital converter 2 which includes the noise therein, and a speech pattern is produced from the characteristic parameter thus extracted. The speech pattern is output to the speech recognizing section 9. At the speech recognizing section 9, the speech pattern output from the pattern producing section 8 which includes the noise therein, and the standard pattern stored in the pattern storage section 10 are compared with each other to effect recognition of the speech input to the microphone 1 which includes the noise therein. The result of the recognition is output from the speech recognizing section 9 to the system controller 6.

The result of the recognition output from the speech recognizing section 9 is supplied to the sound storing and reading out section 3 by way of the system controller 6. The sound storing and reading out section 3 thus reads out the speech signal corresponding to the result of the recognition from the speech memory 4 and supplies the speech signal to the digital to analog converter 12. The digital to analog converter 12 converts the speech signal corresponding to the result of the recognition from a digital signal into an analog signal and supplies it to the loudspeaker 13. Consequently, the speech which is the result of the recognition may be confirmed when it is output from the loudspeaker 13.

As described above, according to the speech recognition apparatus of the present invention, since a standard pattern which includes the noise in the environment in which the apparatus is used is produced in accordance with variations in the noise, the apparatus achieves an improved recognition rate for speech in which noise is included.

It is to be noted that the sound adding section 7 may be modified such that it adds a speech signal stored in the speech memory 4 and a noise signal stored in the noise memory 5 after either the speech signal or the noise signal is weighted.
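The weighted addition mentioned above might be sketched as follows. The sketch assumes sample-by-sample addition of digital signals and, because the stored noise may be shorter than the stored speech, repeats the noise cyclically; the cyclic repetition, the function name `add_sounds` and the weight parameters are assumptions not found in this description.

```python
def add_sounds(speech, noise, speech_weight=1.0, noise_weight=1.0):
    """Add weighted noise to weighted speech sample by sample, looping the noise (assumed)."""
    if not noise:
        raise ValueError("noise signal must contain at least one sample")
    return [
        speech_weight * sample + noise_weight * noise[i % len(noise)]
        for i, sample in enumerate(speech)
    ]


clean_speech = [1.0, 2.0, 3.0]   # hypothetical contents of the speech memory
stored_noise = [0.5, -0.5]       # hypothetical contents of the noise memory

plain = add_sounds(clean_speech, stored_noise)                       # [1.5, 1.5, 3.5]
weighted = add_sounds(clean_speech, stored_noise, noise_weight=2.0)  # [2.0, 1.0, 4.0]
```

With both weights left at 1.0 the function reduces to the unweighted addition of the embodiment, so the weighting can be regarded as an optional adjustment of the effective S/N ratio of the standard pattern.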

Having now fully described the invention, it will be apparent to one of ordinary skill in the art that many changes and modifications can be made thereto without departing from the spirit and scope of the invention as set forth herein.

What is claimed is:
1. A speech recognition apparatus comprising: inputting means for inputting object speech and noise; speech storage means for storing object speech from said inputting means; noise storage means for storing noise from said inputting means; pattern producing means for producing a speech pattern from the object speech from said inputting means and for adding the object speech stored in said speech storage means and the noise stored in said noise storage means to produce a standard pattern; standard pattern storage means for storing the standard pattern produced by said pattern producing means; recognizing means for comparing the speech pattern produced by said pattern producing means with the standard pattern stored in said standard pattern storage means to thereby recognize the object speech from said inputting means; updating means for causing said noise storage means to successively update the noise stored therein in response to a variation between the noise from the inputting means and the noise stored in the noise storage means exceeding a first predetermined reference value; and means for determining when the variation between the noise input from said inputting means and the noise stored in said noise storage means is greater than the first predetermined reference value and also greater than a second predetermined reference value which is higher than the first predetermined reference value, wherein said updating means does not update the noise stored in said noise storage means when said variation between the noise input from said inputting means and the noise stored in said noise storage means is greater than the second predetermined reference value.
2. A speech recognition apparatus according to claim 1, further comprising reproducing means for reproducing the object speech stored in said speech storage means in response to a recognition of object speech by said recognition means.